Sentence Extraction As A Classification Task

Author

  • Simone Teufel
Abstract

A useful first step in document summarisation is the selection of a small number of 'meaningful' sentences from a larger text. Kupiec et al. (1995) describe this as a classification task: on the basis of a corpus of technical papers with summaries written by professional abstractors, their system identifies those sentences in the text which also occur in the summary, and then acquires a model of the 'abstract-worthiness' of a sentence as a combination of a limited number of properties of that sentence. We report on a replication of this experiment with different data: summaries for our documents were not written by professional abstractors, but by the authors themselves. This produced fewer alignable sentences to train on. We use alternative 'meaningful' sentences (selected by a human judge) as training and evaluation material, because this has advantages for the subsequent automatic generation of more flexible abstracts. We quantitatively compare the two different strategies for training and evaluation (via alignment vs. human judgement); we also discuss qualitative differences and consequences for the generation of abstracts.

1 Introduction

A useful first step in the automatic or semi-automatic generation of abstracts from source texts is the selection of a small number of 'meaningful' sentences from the source text. To achieve this, each sentence in the source text is scored according to some measure of importance, and the best-rated sentences are selected. This results in collections of the N most 'meaningful' sentences, in the order in which they appeared in the source text; we will call these excerpts. An excerpt can be used to give readers an idea of what the longer text is about, or it can be used as input into a process to produce a more coherent abstract.

It has been argued for almost 40 years that it is possible to automatically create excerpts which meet basic information compression needs (Luhn, 1958). Since then, different measurements for the importance of a sentence have been suggested, in particular stochastic measurements for the significance of […] stressed the importance of heuristics for the location of the candidate sentence in the source text (Baxendale, 1958) and for the occurrence of cue phrases. Single heuristics tend to work well on documents that resemble each other in style and content. For the more robust creation of excerpts, combinations of these heuristics can be used. The crucial question is how to combine the different heuristics. In the past, the …
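The score-and-select step described above (score every sentence, keep the N best, present them in source order) can be illustrated with a minimal Python sketch. It is not the method of the paper or of Kupiec et al.: the two toy heuristics, the weights, and every function name below are assumptions chosen only for illustration, and the weighted sum is just one possible way of combining heuristics.

```python
from typing import List, Sequence, Tuple


def position_score(index: int, total: int) -> float:
    """Toy location heuristic (assumption): earlier sentences score higher."""
    return 1.0 - index / total


def cue_phrase_score(sentence: str, cues: Sequence[str]) -> float:
    """Toy cue-phrase heuristic (assumption): 1.0 if any cue phrase occurs."""
    lowered = sentence.lower()
    return 1.0 if any(cue in lowered for cue in cues) else 0.0


def make_excerpt(
    sentences: List[str],
    n: int,
    cues: Sequence[str],
    weights: Tuple[float, float] = (1.0, 1.0),  # hypothetical weights
) -> List[str]:
    """Return the n best-scored sentences in their original order."""
    total = len(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        score = (
            weights[0] * position_score(i, total)
            + weights[1] * cue_phrase_score(sentence, cues)
        )
        scored.append((score, i))
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n]
    # Restore source order so the excerpt reads like the original text.
    keep = sorted(i for _, i in best)
    return [sentences[i] for i in keep]


sentences = [
    "We study summarisation.",
    "The weather was fine.",
    "Data came from papers.",
    "In conclusion, extraction works.",
]
excerpt = make_excerpt(sentences, n=2, cues=("in conclusion",))
print(excerpt)
```

In this sketch the weights are fixed by hand. A classification approach of the kind the abstract attributes to Kupiec et al. (1995) would instead learn how to combine sentence properties from training data.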

Similar resources

Extraction of Drug-Drug Interaction from Literature through Detecting Linguistic-based Negation and Clause Dependency

Extracting biomedical relations such as drug-drug interaction (DDI) from text is an important task in biomedical NLP. Because biomedical literature contains many complex sentences, researchers have employed sentence simplification techniques to improve the performance of relation extraction methods. However, due to the difficulty of the task, there is no noteworthy improvement in t...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Information Extraction and Sentence Classification applied to Clinical Trial MEDLINE Abstracts

In this paper, we first report experimental results on applying information extraction (IE) methodology to the task of summarizing clinical trial design information, focusing on “Compared Treatment”, “Endpoint” and “Patient Population”, from clinical trial MEDLINE abstracts. From these results, we have come to see this problem as one that can be decomposed into a sentence classification subtask...

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Massed/Distributed Sentence Writing: Post Tasks of Noticing Activity

The purpose of the study was to activate passive lexical knowledge through noticing and to investigate the effect of sentence writing, as the post-task of a noticing activity, on strengthening the effect of noticing. Forty-two Iranian female adult upper-intermediate English students of a state university, in two homogeneous groups, participated in noticing the lexical items whose production were not...

Publication date: 1997